 contrastive search


The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices

Arias, Esteban Garces, Sapargali, Nurzhan, Heumann, Christian, Aßenmacher, Matthias

arXiv.org Machine Learning

Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding. We hypothesize this contributes to the detectability of machine-generated text. Analyzing over 1.8 million texts across eight language models, five decoding strategies, and 53 hyperparameter configurations, we find that 8-18% of human-selected tokens fall outside typical truncation boundaries. Simple classifiers trained on predictability and lexical diversity achieve high detection accuracy. Crucially, neither model scale nor architecture correlates strongly with detectability; truncation parameters account for most variance. Configurations achieving low detectability often produce incoherent text, indicating that evading detection and producing natural text are distinct objectives. These findings suggest that detectability stems from likelihood-based token selection rather than from model capability alone.
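The truncation blind spot described in the abstract can be illustrated with a minimal nucleus (top-p) sketch. The toy distribution, the threshold p=0.90, and the "human choice" index below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def nucleus_set(probs, p=0.95):
    """Return the indices of the smallest set of tokens whose
    cumulative probability exceeds p (nucleus / top-p truncation)."""
    order = np.argsort(probs)[::-1]       # tokens sorted by probability, descending
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # smallest prefix covering mass p
    return set(int(i) for i in order[:cutoff])

# Toy next-token distribution over a 10-token vocabulary.
probs = np.array([0.40, 0.25, 0.15, 0.08, 0.05,
                  0.03, 0.02, 0.01, 0.006, 0.004])

kept = nucleus_set(probs, p=0.90)
human_choice = 7  # a contextually apt but statistically rare token (hypothetical)
print(sorted(kept))          # tokens reachable by nucleus sampling: [0, 1, 2, 3, 4]
print(human_choice in kept)  # False: the "blind spot" in action
```

Any token outside `kept` has zero probability under nucleus sampling at this threshold, which is exactly why rare-but-appropriate human choices become a detectable signal.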




Context-Enhanced Contrastive Search for Improved LLM Text Generation

Sen, Jaydip, Pandey, Rohit, Waghela, Hetvi

arXiv.org Artificial Intelligence

Recently, Large Language Models (LLMs) have demonstrated remarkable advancements in Natural Language Processing (NLP). However, generating high-quality text that balances coherence, diversity, and relevance remains challenging. Traditional decoding methods, such as beam search and top-k sampling, often struggle with either repetitive or incoherent outputs, particularly in tasks that require long-form text generation. To address these limitations, the paper proposes a novel enhancement of the well-known Contrastive Search algorithm, Context-Enhanced Contrastive Search (CECS) with contextual calibration. The proposed scheme introduces several novelties, including dynamic contextual importance weighting, multi-level Contrastive Search, and adaptive temperature control, to optimize the balance between fluency, creativity, and precision. The performance of CECS is evaluated using several standard metrics such as BLEU, ROUGE, and semantic similarity. Experimental results demonstrate significant improvements in both coherence and relevance of the texts generated by CECS, outperforming existing Contrastive Search techniques. The proposed algorithm has several potential real-world applications, including legal document drafting, customer service chatbots, and content marketing.

In recent years, Large Language Models (LLMs) have transformed the field of Natural Language Processing (NLP), delivering cutting-edge performance across numerous tasks, including text generation, summarization, machine translation, and question answering. Models such as OpenAI's GPT-3 [1], Google's BERT [2], and more recently PaLM [3], have greatly enhanced the capabilities of machines in understanding and generating human language. By leveraging deep neural network architectures and training on extensive datasets, LLMs have made significant strides in producing fluent and coherent text that closely resembles human communication.
Generating text from an LLM involves more than simply predicting the next word in a sequence according to its probability distribution. This step, known as decoding, plays a critical role in shaping the final output. Various decoding strategies have been proposed in the literature, ranging from deterministic methods such as beam search to stochastic methods like top-k and nucleus sampling. While deterministic methods choose the highest-probability token at each step, their stochastic counterparts introduce randomness to improve diversity in the generated output.
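The deterministic/stochastic contrast above can be sketched over a toy distribution (the distribution and k below are illustrative, not drawn from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy(probs):
    """Deterministic decoding: always pick the single most likely token."""
    return int(np.argmax(probs))

def top_k_sample(probs, k=3):
    """Stochastic decoding: renormalize over the k most likely tokens
    and sample, trading a little likelihood for diversity."""
    top = np.argsort(probs)[::-1][:k]
    renorm = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=renorm))

# Toy next-token distribution over a 5-token vocabulary.
probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])

print(greedy(probs))        # always 0
print(top_k_sample(probs))  # 0, 1, or 2, chosen at random
```

Greedy decoding returns the same token every time, while top-k sampling can return any of the k retained tokens; tokens outside the top k are unreachable, which is the truncation behavior the other abstracts on this page revisit.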


Decoding Decoded: Understanding Hyperparameter Effects in Open-Ended Text Generation

Arias, Esteban Garces, Li, Meimingwei, Heumann, Christian, Aßenmacher, Matthias

arXiv.org Artificial Intelligence

Decoding strategies for generative large language models (LLMs) are a critical but often underexplored aspect of text generation tasks. Guided by specific hyperparameters, these strategies aim to transform the raw probability distributions produced by language models into coherent, fluent text. In this study, we undertake a large-scale empirical assessment of a range of decoding methods, open-source LLMs, textual domains, and evaluation protocols to determine how hyperparameter choices shape the outputs. Our experiments include both factual (e.g., news) and creative (e.g., fiction) domains, and incorporate a broad suite of automatic evaluation metrics alongside human judgments. Through extensive sensitivity analyses, we distill practical recommendations for selecting and tuning hyperparameters, noting that optimal configurations vary across models and tasks. By synthesizing these insights, this study provides actionable guidance for refining decoding strategies, enabling researchers and practitioners to achieve higher-quality, more reliable, and context-appropriate text generation outcomes.


Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation

Arias, Esteban Garces, Rodemann, Julian, Li, Meimingwei, Heumann, Christian, Aßenmacher, Matthias

arXiv.org Machine Learning

Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, top-$k$ sampling, nucleus (top-$p$) sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence and diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks. Our code base, datasets, and models are publicly available.
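A simplified sketch of the contrastive search objective, and of the adaptive idea, may help. The standard score trades model confidence against a degeneration penalty (maximum similarity to context representations); the `adaptive_alpha` function below uses normalized entropy as an uncertainty proxy, which is our illustrative assumption and not necessarily the paper's exact estimator:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_score(probs, cand_states, ctx_states, alpha):
    """Contrastive search: pick the candidate maximizing model confidence
    minus a degeneration penalty (max similarity to context states)."""
    scores = []
    for p, h in zip(probs, cand_states):
        penalty = max(cosine(h, c) for c in ctx_states)
        scores.append((1 - alpha) * p - alpha * penalty)
    return int(np.argmax(scores))

def adaptive_alpha(probs, alpha_max=0.6):
    """Sketch of the adaptive variant: weight the penalty by the
    normalized entropy of the next-token distribution (an assumed
    uncertainty proxy; the paper's estimator may differ)."""
    probs = np.asarray(probs)
    ent = -np.sum(probs * np.log(probs + 1e-12))
    return alpha_max * ent / np.log(len(probs))

# Two candidates: the likelier one duplicates the context representation.
probs = [0.6, 0.4]
ctx = [np.array([1.0, 0.0])]
cands = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

pick = contrastive_score(probs, cands, ctx, alpha=0.5)
print(pick)  # 1: the less likely but non-repetitive token wins
```

With a fixed alpha the penalty is applied uniformly; making alpha track uncertainty lets the penalty relax when the model is confident and tighten when it is not.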


Duwak: Dual Watermarks in Large Language Models

Zhu, Chaoyi, Galjaard, Jeroen, Chen, Pin-Yu, Chen, Lydia Y.

arXiv.org Artificial Intelligence

As large language models (LLMs) are increasingly used for text generation tasks, it is critical to audit their usage, govern their applications, and mitigate their potential harms. Existing watermark techniques are shown effective in embedding single human-imperceptible and machine-detectable patterns without significantly affecting generated text quality and semantics. However, the efficiency in detecting watermarks, i.e., the minimum number of tokens required to assert detection with significance and robustness against post-editing, is still debatable. In this paper, we propose Duwak, which fundamentally enhances the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme. To mitigate expression degradation caused by biasing toward certain tokens, we design a contrastive search to watermark the sampling scheme, which minimizes token repetition and enhances diversity. We theoretically explain the interdependency of the two watermarks within Duwak. We evaluate Duwak extensively on Llama2 under various post-editing attacks, against four state-of-the-art watermarking techniques and combinations of them. Our results show that Duwak-marked text achieves the highest watermarked text quality at the lowest required token count for detection, up to 70% fewer tokens than existing approaches, especially under paraphrasing attacks.


Fine-grained Conversational Decoding via Isotropic and Proximal Search

Yao, Yuxuan, Wu, Han, Xu, Qiling, Song, Linqi

arXiv.org Artificial Intelligence

General-purpose text decoding approaches are usually adopted for dialogue response generation. Although the quality of the generated responses can be improved with dialogue-specific encoding methods, conversational decoding methods are still under-explored. Inspired by the finding of \citet{wu2023learning} that a good dialogue feature space should follow the rules of locality and isotropy, we present a fine-grained conversational decoding method, termed \textit{isotropic and proximal search (IPS)}. Our method is designed to generate semantically concentrated responses while still maintaining informativeness and discrimination against the context. Experiments show that our approach outperforms existing decoding strategies in the dialogue field across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our approach.


Fidelity-Enriched Contrastive Search: Reconciling the Faithfulness-Diversity Trade-Off in Text Generation

Chen, Wei-Lin, Wu, Cheng-Kuang, Chen, Hsin-Hsi, Chen, Chung-Chi

arXiv.org Artificial Intelligence

In this paper, we address the hallucination problem commonly found in natural language generation tasks. Language models often generate fluent and convincing content but can lack consistency with the provided source, resulting in potential inaccuracies. We propose a new decoding method called Fidelity-Enriched Contrastive Search (FECS), which augments the contrastive search framework with context-aware regularization terms. FECS promotes tokens that are semantically similar to the provided source while penalizing repetitiveness in the generated text. We demonstrate its effectiveness across two tasks prone to hallucination: abstractive summarization and dialogue generation. Results show that FECS consistently enhances faithfulness across various language model sizes while maintaining output diversity comparable to well-performing decoding algorithms.


Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

Bonaldi, Helena, Attanasio, Giuseppe, Nozza, Debora, Guerini, Marco

arXiv.org Artificial Intelligence

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can lead to in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to the training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.